To understand deep learning we need to understand kernel learning

Authors

  • Mikhail Belkin
  • Siyuan Ma
  • Soumik Mandal
Abstract

Generalization performance of classifiers in deep learning has recently become a subject of intense study. Deep models, which are typically heavily over-parametrized, tend to fit the training data exactly. Despite this overfitting, they perform well on test data, a phenomenon not yet fully understood. The first point of our paper is that strong performance of overfitted classifiers is not a unique feature of deep learning. Using six real-world and two synthetic datasets, we establish experimentally that kernel classifiers trained to have zero classification error (overfitting) or zero regression error (interpolation) perform very well on test data. We proceed to prove lower bounds on the norm of overfitted solutions for smooth kernels, showing that they increase nearly exponentially with data size. Since most generalization bounds depend polynomially on the norm of the solution, this result implies that they diverge as data increases. Furthermore, the existing bounds do not apply to interpolated classifiers. We also show experimentally that (non-smooth) Laplacian kernels easily fit random labels using a version of SGD, a finding that parallels results recently reported for ReLU neural networks. In contrast, as expected from theory, fitting noisy data requires many more epochs for smooth Gaussian kernels. The observation that the ultimate performance of overfitted Laplacian and Gaussian classifiers on test data is quite similar suggests that generalization is tied to the properties of the kernel function rather than the optimization process. We see that some key phenomena of deep learning are manifested similarly in kernel methods in the “modern” overfitted regime. We argue that progress on understanding deep learning will be difficult until more analytically tractable “shallow” kernel methods are better understood.
The combination of the experimental and theoretical results presented in this paper indicates a need for new theoretical ideas for understanding properties of classical kernel methods.
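As an illustration of the interpolation regime described in the abstract, the following is a minimal sketch, not the authors' experimental setup. It fits Gaussian and Laplacian kernel interpolants to a small synthetic problem and reports the RKHS norm of each solution. The data, the bandwidth value, the Euclidean form of the Laplacian kernel, and the direct linear solve (used here in place of the SGD-based training the abstract mentions) are all assumptions made for the example.

```python
import numpy as np


def sq_dists(X, Z):
    """Pairwise squared Euclidean distances between rows of X and rows of Z."""
    return np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=-1)


def gaussian_kernel(X, Z, bandwidth):
    """Smooth kernel: exp(-||x - z||^2 / (2 * bandwidth^2))."""
    return np.exp(-sq_dists(X, Z) / (2.0 * bandwidth ** 2))


def laplacian_kernel(X, Z, bandwidth):
    """Non-smooth kernel (assumed Euclidean form): exp(-||x - z|| / bandwidth)."""
    return np.exp(-np.sqrt(sq_dists(X, Z)) / bandwidth)


def fit_interpolant(kernel, X, y, bandwidth):
    """Solve K alpha = y so that the kernel predictor matches every training label."""
    K = kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K, y)
    # RKHS norm of the interpolant: sqrt(alpha^T K alpha) = sqrt(y^T K^{-1} y).
    rkhs_norm = float(np.sqrt(alpha @ K @ alpha))
    return alpha, rkhs_norm


# Hypothetical synthetic data: the label is the sign of the first coordinate.
rng = np.random.default_rng(0)
n_train, n_test, dim = 40, 200, 5
X_train = rng.standard_normal((n_train, dim))
X_test = rng.standard_normal((n_test, dim))
y_train = np.where(X_train[:, 0] > 0, 1.0, -1.0)
y_test = np.where(X_test[:, 0] > 0, 1.0, -1.0)

bandwidth = 1.0  # assumed value, not taken from the paper
results = {}
for name, kernel in [("gaussian", gaussian_kernel), ("laplacian", laplacian_kernel)]:
    alpha, norm = fit_interpolant(kernel, X_train, y_train, bandwidth)
    train_pred = kernel(X_train, X_train, bandwidth) @ alpha
    test_pred = np.sign(kernel(X_test, X_train, bandwidth) @ alpha)
    results[name] = {
        "train_pred": train_pred,
        "norm": norm,
        "test_acc": float(np.mean(test_pred == y_test)),
    }
    print(name, "RKHS norm:", norm, "test accuracy:", results[name]["test_acc"])
```

Both interpolants reproduce the training labels up to numerical precision, which is the "zero regression error" condition in the abstract. Repeating the fit for increasing `n_train` is one way to watch how the norm of the interpolating solution changes with data size.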

Related resources

Comparison of actual and preferred nursing student perception of clinical learning environment in Arak university of medical sciences, 2009

Introduction: Nursing is a practice-based discipline. The clinical field is an essential and irreplaceable resource in preparing student nurses for their professional role. Despite many changes that occur in the clinical learning environment (CLE), these environments remain important to nurse training. However, we need to identify the key attributes of a good CLE. It is important we utilize these f...

User’s Interaction with Information through eFront Learning Management System

Background and Aim: To understand interactive content and content production standards, as well as users' interaction with LMSs and their behavior in dealing with information, this paper examines users' interaction with the information provided in eFront, an open-source Learning Management System, with emphasis on the SCORM standard. Method: The method that used...

Crop Land Change Monitoring Based on Deep Learning Algorithm Using Multi-temporal Hyperspectral Images

Change detection analyzes two or more images of a region obtained at different times. It is one of the most important applications of satellite imagery, with uses in urban development, environmental inspection, agricultural monitoring, hazard assessment, and natural disaster. The purpose of using deep learning algorithms, in particular, convolutional ...

What Does a TextCNN Learn?

TextCNN, the convolutional neural network for text, is a useful deep learning algorithm for sentence classification tasks such as sentiment analysis and question classification[2]. However, neural networks have long been known as black boxes because interpreting them is a challenging task. Researchers have developed several tools to understand a CNN for image classification by deep visualizatio...

Semi-supervised composite kernel learning using distance metric learning techniques

Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...

Journal:
  • CoRR

Volume: abs/1802.01396

Publication date: 2018